Detecting High Obfuscation Plagiarism: Exploring Multi-Features Fusion via Machine Learning

Authors

  • Leilei Kong
  • Zhimao Lu
  • Haoliang Qi
  • Zhongyuan Han
Abstract

Identifying high-obfuscation plagiarism seeds effectively is a significant research problem in plagiarism detection. Conventional plagiarism detection methods rely on a single type of feature to capture plagiarism seeds. For high-obfuscation plagiarism, however, single-type features are not sufficient to identify the seeds effectively, because the plagiarism methods involved are varied. This paper presents a multi-feature fusion method for identifying high-obfuscation plagiarism seeds. The method uses a logistic regression model to integrate lexical, syntactic, semantic, and structural features extracted from the suspicious document and the source document. A multi-feature fusion classifier based on the logistic regression model decides whether a pair of text fragments can be regarded as plagiarism seeds. Experimental results on the PAN@CLEF2013 summary-obfuscation corpus show that fusing different types of features produces more accurate results.
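The abstract does not list the concrete features or the training setup, so the following is only a minimal sketch of the general idea: compute several similarity features for a (suspicious fragment, source fragment) pair and let a logistic regression classifier fuse them into one decision. The four features here (word overlap, word-bigram overlap, character-trigram overlap, length ratio) are simple stand-ins chosen for illustration, not the paper's lexical, syntactic, semantic, and structural features. All function names, hyperparameters, and the toy data are hypothetical.

```python
import math

def tokens(text):
    return text.lower().split()

def ngrams(seq, n):
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

def jaccard(a, b):
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def pair_features(suspicious, source):
    """Illustrative feature vector for one fragment pair (assumed features)."""
    s, t = tokens(suspicious), tokens(source)
    return [
        jaccard(s, t),                                  # word overlap
        jaccard(ngrams(s, 2), ngrams(t, 2)),            # word-bigram overlap
        jaccard(ngrams(suspicious.lower(), 3),
                ngrams(source.lower(), 3)),             # character-trigram overlap
        min(len(s), len(t)) / max(len(s), len(t), 1),   # length ratio
    ]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train(X, y, lr=0.5, epochs=5000):
    """Full-batch gradient descent on the logistic loss (assumed settings)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

# Hypothetical toy pairs: label 1 = plagiarism seed, 0 = unrelated.
TOY_PAIRS = [
    ("the cat sat on the mat near the door",
     "the cat sat on a mat by the door", 1),
    ("machine learning improves plagiarism detection accuracy",
     "plagiarism detection accuracy improves with machine learning", 1),
    ("students copied the essay from the web",
     "the essay was copied from the web by students", 1),
    ("the cat sat on the mat near the door",
     "stock prices fell sharply on monday", 0),
    ("machine learning improves plagiarism detection accuracy",
     "the recipe needs two eggs and flour", 0),
    ("students copied the essay from the web",
     "rain is expected across the northern coast", 0),
]

def fit_toy_model():
    X = [pair_features(a, b) for a, b, _ in TOY_PAIRS]
    y = [label for _, _, label in TOY_PAIRS]
    return train(X, y)

def is_seed(w, b, suspicious, source, threshold=0.5):
    return predict_proba(w, b, pair_features(suspicious, source)) > threshold
```

Calling `fit_toy_model()` returns the weights and bias, and `is_seed(w, b, suspicious, source)` then applies the fused decision to a new pair. A real system would replace the stand-in features with richer ones and train on a labelled corpus rather than six toy pairs.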

Similar resources

Analyzing new features of infected web content in detection of malicious web pages

Recent improvements in web standards and technologies enable attackers to hide and obfuscate malicious code with new methods and thus escape security filters. In this paper, we study the application of machine learning techniques in detecting malicious web pages. In order to detect malicious web pages, we propose and analyze a novel set of features including HTML, JavaScript (jQuery...

A Text Alignment Algorithm Based on Prediction of Obfuscation Types Using SVM Neural Network

In this paper, we describe our text alignment algorithm that achieved the first rank in Persian Plagdet 2016 competition. The Persian Plagdet corpus includes several obfuscation strategies. Information about the type of obfuscation helps plagiarism detection systems to use their most suitable algorithm for each type. For this purpose, we use SVM neural network for classification of documents ac...

Machine Translation Evaluation Metric for Text Alignment

As plagiarisers become cleverer, plagiarism detection becomes harder. Plagiarisers will find new ways to obfuscate the plagiarized passages so that humans and automatic plagiarism detectors are not able to point them out. So, a plagiarism detection system needs to be robust enough to detect plagiarism, no matter what obfuscation techniques have been applied. Our system attempts to do the same b...

COAT: Code ObfuscAtion Tool to evaluate the performance of code plagiarism detection tools

There exist many plagiarism detection tools to uncover plagiarized codes by analyzing the similarity of source codes. To measure how reliable those plagiarism detection tools are, we developed a tool named Code ObfuscAtion Tool (COAT) that takes a program source code as input and produces another source code that is exactly equivalent to the input source code in their functional behaviors but w...

A Novel Architecture for Detecting Phishing Webpages using Cost-based Feature Selection

Phishing is one of the luring techniques used to exploit personal information. A phishing webpage detection system (PWDS) extracts features to determine whether it is a phishing webpage or not. Selecting appropriate features improves the performance of PWDS. Performance criteria are detection accuracy and system response time. The major time consumed by PWDS arises from feature extraction that ...

Publication date: 2014